Abstract — We address the problem of controlling a noisy
differential drive mobile robot such that the probability of 
satisfying a specification given as a Bounded Linear Temporal 
Logic (BLTL) formula over a set of properties at the regions 
in the environment is maximized. We assume that the vehicle 
can determine its precise initial position in a known map of 
the environment. However, inspired by practical limitations, 
we assume that the vehicle is equipped with noisy actuators 
and, during its motion in the environment, it can only measure 
the angular velocity of its wheels using limited accuracy 
incremental encoders. Assuming the duration of the motion is 
finite, we map the measurements to a Markov Decision Process 
(MDP). We use recent results in Statistical Model Checking 
(SMC) to obtain an MDP control policy that maximizes the 
probability of satisfaction. We translate this policy to a vehicle 
feedback control strategy and show that the probability that the 
vehicle satisfies the specification in the environment is bounded 
from below by the probability of satisfying the specification 
on the MDP. We illustrate our method with simulations and 
experimental results. 

I. Introduction 

Robot motion planning and control has been widely studied in the last twenty years. Recently, temporal logics, such as Linear Temporal Logic (LTL) and Computation Tree Logic (CTL), have become increasingly popular for specifying robotic tasks (see, for example, [KGFP07], [KF08], [KB08b], [WTM09]). It has been shown that temporal logics can serve as rich languages capable of specifying complex motion missions such as "go to region A and avoid region B unless regions C or D are visited".

In order to use existing model checking tools for motion planning (see [BK08]), many of the above-mentioned works rely on the assumption that the motion of the vehicle in the environment can be modeled as a finite system [CGP99] that is either deterministic (applying an available action triggers a unique transition [KB08b]) or nondeterministic (applying an available action can enable multiple transitions, with no information on their likelihoods [KB08a]). Recent results show that, if sensor and actuator noise models can be obtained from empirical measurements or an accurate simulator, then the robot motion can be modeled as a Markov Decision Process (MDP), and probabilistic temporal logics, such as Probabilistic CTL (PCTL) and Probabilistic LTL (PLTL), can be used for motion planning and control (see [LAB12]).
However, robot dynamics are normally described by control systems with state and control variables evaluated over infinite domains. A widely used approach for temporal logic verification and control of such a system is through the construction of a finite abstraction ([TP06], [Gir07], [YTC+12]). Even though recent works discuss the construction of abstractions for stochastic systems [JP09], [ADBS08], the existing methods are either not applicable to robot dynamics or are computationally infeasible given the size of the problem in most robotic applications.

In this paper, we consider a vehicle whose performance is measured by the completion of time-constrained temporal logic tasks. In particular, we provide a conservative solution to the problem of controlling a stochastic differential drive mobile robot such that the probability of satisfying a specification given as a Bounded Linear Temporal Logic (BLTL) formula over a set of properties at the regions in the environment is maximized. Inspired by a realistic scenario of an indoor vehicle leaving its charging station, we assume that the vehicle can determine its precise initial position in a known map of the environment. The actuator noise is modeled as a random variable with a continuous probability distribution supported on a bounded interval, where the distribution is obtained through experimental trials. Also, we assume that the vehicle is equipped with two limited-accuracy incremental encoders, each measuring the angular velocity of one of the wheels, as the only means of measurement available.

Assuming the duration of the motion is finite, through discretization, we map the incremental encoder measurements to an MDP. By relating the MDP to the vehicle motion in the environment, the vehicle control problem becomes equivalent to the problem of finding a control policy for an MDP such that the probability of satisfying the BLTL formula is maximized. Due to the size of the MDP, finding the exact solution is computationally too expensive. Therefore, we trade off correctness for scalability and use computationally efficient techniques based on system sampling. Specifically, we use recent results in Statistical Model Checking (SMC) for MDPs ([HMZ+12]) to obtain an MDP control policy and a Bayesian Interval Estimation (BIE) algorithm ([ZPC10]) to estimate the probability of satisfying the specification. Finally, we show that the probability that the vehicle satisfies the specification in the original environment is bounded from below by the probability of satisfying the specification on the MDP under the obtained control policy.

The main contribution of this work lies in bridging the gap between low-level sensory inputs and high-level temporal logic specifications. We develop a framework for the synthesis of a vehicle feedback control strategy from such specifications based on a realistic model of an incremental encoder. This paper extends our previous work ([CB12]) on controlling a stochastic version of the Dubins vehicle such that the probability of satisfying a temporal logic statement, given as a PCTL formula, over some environmental properties, is maximized. Specifically, the approach presented here allows for richer temporal logic specifications, where the vehicle performance is measured by the completion of time-constrained temporal logic tasks. In [HMZ+12], the authors use SMC for MDPs to solve a motion planning problem for a vehicle moving on a finite grid and knowing its state precisely at all times, when the task is given as a BLTL formula. We adopt this approach to control a vehicle with continuous dynamics and uncertainty in its state.

The remainder of the paper is organized as follows. In Sec. II we introduce the necessary notation and review some preliminaries. We formulate the problem and outline the approach in Sec. III. In Secs. IV to VII we explain the construction of the MDP and the relation between the MDP and the motion of the vehicle in the environment. The vehicle control policy is obtained in Sec. VIII. Case studies and experimental results illustrating our approach are presented in Sec. IX. We conclude with final remarks and directions for future work in Sec. X.

II. Preliminaries 

In this section we provide a short and informal introduction to Markov Decision Processes (MDP) and Bounded Linear Temporal Logic (BLTL). For details about MDPs the reader is referred to [BK08] and [CGP99], and for more information about BLTL to [JCL+09] and [ZPC10].

Definition 1 (MDP): A Markov Decision Process (MDP) is a tuple $M = (S, s_0, Act, A, P)$, where $S$ is a finite set of states; $s_0 \in S$ is the initial state; $Act$ is a finite set of actions; $A: S \to 2^{Act}$ is a function specifying the enabled actions at a state $s$; $P: S \times Act \times S \to [0,1]$ is a transition probability function such that for all states $s \in S$ and actions $a \in A(s)$: $\sum_{s' \in S} P(s, a, s') = 1$, and for all actions $a \notin A(s)$ and $s' \in S$, $P(s, a, s') = 0$.

A control policy for an MDP resolves nondeterminism in 
each state s by providing a distribution over the set of actions 
enabled in s. 

Definition 2 (MDP Control Policy): A control policy $\mu$ of an MDP $M$ is a function $\mu: S \times Act \to [0,1]$ such that $\sum_{a \in A(s)} \mu(s, a) = 1$ and $\mu(s, a) > 0$ only if $a$ is enabled in $s$. A control policy for which either $\mu(s, a) = 1$ or $\mu(s, a) = 0$ for all pairs $(s, a) \in S \times Act$ is called deterministic.
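As a concrete reading of Definitions 1 and 2, the sketch below stores a small finite MDP in plain Python dictionaries and checks that every enabled action induces a probability distribution over successor states and that a control policy is a distribution over the enabled actions only. The dictionary layout and the toy states `s0`, `s1` and actions `a`, `b` are illustrative assumptions, not part of the paper.

```python
import math
from typing import Dict, List, Set, Tuple

State = str
Action = str
Transitions = Dict[Tuple[State, Action, State], float]  # P(s, a, s'); missing keys mean 0
Policy = Dict[Tuple[State, Action], float]               # mu(s, a); missing keys mean 0


def is_valid_mdp(states: List[State], enabled: Dict[State, Set[Action]],
                 P: Transitions, tol: float = 1e-9) -> bool:
    """Def. 1: sum_{s'} P(s, a, s') = 1 for every enabled a, and P(s, a, .) = 0 otherwise."""
    for (s, a, _), p in P.items():
        if not 0.0 <= p <= 1.0:
            return False
        if a not in enabled[s] and p != 0.0:
            return False
    for s in states:
        for a in enabled[s]:
            total = sum(P.get((s, a, s2), 0.0) for s2 in states)
            if not math.isclose(total, 1.0, abs_tol=tol):
                return False
    return True


def is_valid_policy(states: List[State], enabled: Dict[State, Set[Action]],
                    mu: Policy, tol: float = 1e-9) -> bool:
    """Def. 2: mu(s, .) sums to 1 over A(s) and is positive only on enabled actions."""
    for (s, a), p in mu.items():
        if p < 0.0 or (p > 0.0 and a not in enabled[s]):
            return False
    return all(math.isclose(sum(mu.get((s, a), 0.0) for a in enabled[s]), 1.0, abs_tol=tol)
               for s in states)


def is_deterministic(mu: Policy) -> bool:
    """Def. 2: every mu(s, a) is either 0 or 1."""
    return all(p in (0.0, 1.0) for p in mu.values())


if __name__ == "__main__":
    states = ["s0", "s1"]
    enabled = {"s0": {"a", "b"}, "s1": {"a"}}
    P = {("s0", "a", "s0"): 0.3, ("s0", "a", "s1"): 0.7,
         ("s0", "b", "s1"): 1.0, ("s1", "a", "s1"): 1.0}
    mu = {("s0", "a"): 0.5, ("s0", "b"): 0.5, ("s1", "a"): 1.0}
    print(is_valid_mdp(states, enabled, P), is_valid_policy(states, enabled, mu),
          is_deterministic(mu))  # True True False
```

A dictionary keyed by `(s, a, s')` keeps the sketch close to the notation $P(s, a, s')$; a real implementation for a large MDP would more likely store, per state and action, only the nonzero successors.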

We employ Bounded Linear Temporal Logic (BLTL) to describe high-level motion specifications. BLTL is a variant of Linear Temporal Logic (LTL) ([BK08]) which requires only paths of bounded size. A detailed description of the syntax and semantics of BLTL is beyond the scope of this paper and can be found in [JCL+09] and [ZPC10]. Roughly, formulas of BLTL are constructed by connecting properties from a set of propositions $\Pi$ using Boolean operators ($\neg$ (negation), $\wedge$ (conjunction), $\vee$ (disjunction)) and temporal operators ($\mathcal{U}^{\le t}$ (bounded until), $\mathcal{F}^{\le t}$ (bounded finally), and $\mathcal{G}^{\le t}$ (bounded globally), where $t \in \mathbb{R}_{\ge 0}$ is the time bound parameter). The semantics of BLTL formulas are given over infinite traces $\sigma = (o_1, t_1)(o_2, t_2)\dots$, $o_i \in 2^{\Pi}$, $t_i \in \mathbb{R}_{\ge 0}$, $i \ge 1$, where $o_i$ is the set of satisfied propositions and $t_i$ is the time spent satisfying $o_i$. A trace satisfies a BLTL formula $\phi$ if $\phi$ is true at the first position of the trace; $\mathcal{F}^{\le t}\phi_1$ means that $\phi_1$ will be true within $t$ time units; $\mathcal{G}^{\le t}\phi_1$ means that $\phi_1$ will remain true for the next $t$ time units; and $\phi_1 \mathcal{U}^{\le t} \phi_2$ means that $\phi_2$ will be true within the next $t$ time units and $\phi_1$ remains true until then. More expressivity can be achieved by combining the above temporal and Boolean operators.

III. Problem Formulation and Approach 

A. Problem Formulation 

A differential drive mobile robot ([LaV06]) is a vehicle 
having two main wheels, each of which is attached to its 
own motor, and a third wheel which passively rolls along 
preventing the robot from falling over. In this paper, we 
consider a stochastic version of a differential drive mobile 
robot, which captures actuator noise: 
where $(x, y) \in \mathbb{R}^2$ and $\theta \in [0, 2\pi)$ are the position and orientation of the vehicle in a world frame, $u_r$ and $u_l$ are the control inputs directly specifying the two desired angular wheel velocities, $U_r$ and $U_l$ are the finite control constraint sets, and $\epsilon_r$ and $\epsilon_l$ are random variables modeling the actuator noise with continuous probability density functions supported on the bounded intervals $[\epsilon_r^{min}, \epsilon_r^{max}]$ and $[\epsilon_l^{min}, \epsilon_l^{max}]$, respectively. $L$ is the distance between the two wheels and $r$ is the wheel radius. We denote the state of the system by $q = [x, y, \theta]^T \in SE(2)$.
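Eqn. (1) itself does not survive in this text. Assuming it is the standard differential drive model of [LaV06] with additive wheel noise, i.e., $\dot x = \frac{r}{2}(w_r + w_l)\cos\theta$, $\dot y = \frac{r}{2}(w_r + w_l)\sin\theta$, $\dot\theta = \frac{r}{L}(w_r - w_l)$ with applied wheel velocities $w_i = u_i + \epsilon_i$, motion over one interval with constant wheel velocities has a closed form: a straight segment when $w_r = w_l$ and a circular arc otherwise (a turn in place when $w_r = -w_l$). The sketch below implements that closed form; the function name `stage_endpoint` is illustrative.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)


def stage_endpoint(q: Pose, w_r: float, w_l: float, dt: float, r: float, L: float) -> Pose:
    """End pose after dt time units with constant applied wheel velocities w_r, w_l,
    under the assumed kinematics x' = v cos(theta), y' = v sin(theta), theta' = omega,
    where v = r (w_r + w_l) / 2 and omega = r (w_r - w_l) / L."""
    x, y, th = q
    v = 0.5 * r * (w_r + w_l)
    omega = r * (w_r - w_l) / L
    if abs(omega) < 1e-12:                  # straight segment
        x += v * dt * math.cos(th)
        y += v * dt * math.sin(th)
    else:                                   # circular arc of radius v / omega
        th_new = th + omega * dt
        x += (v / omega) * (math.sin(th_new) - math.sin(th))
        y -= (v / omega) * (math.cos(th_new) - math.cos(th))
        th = th_new
    return x, y, th % (2 * math.pi)         # keep theta in [0, 2*pi)


if __name__ == "__main__":
    print(stage_endpoint((0.0, 0.0, 0.0), 2.0, 2.0, 1.0, 0.5, 0.3))    # straight, (1.0, 0.0, 0.0)
    print(stage_endpoint((0.0, 0.0, 0.0), 1.0, -1.0, 1.0, 0.5, 0.3))   # turn in place
```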

Motivated by the fact that the time-optimal trajectories for bounded-velocity differential drive robots are composed only of turns in place and straight lines ([BM00]), we assume $U_r$ and $U_l$ are finite, but we make no assumptions on optimality. We define

$$W_i = \{u + \epsilon \mid u \in U_i,\ \epsilon \in [\epsilon_i^{min}, \epsilon_i^{max}]\}, \quad i \in \{r, l\},$$
as the sets of applied control inputs, i.e., the sets of angular wheel velocities that are applied to the system in the presence of noise. We assume that time is uniformly discretized (partitioned) into stages (intervals) of length $\Delta t$, where stage $k$ is from $(k-1)\Delta t$ to $k\Delta t$. The duration of the motion is finite and it is denoted $K\Delta t$ (later in this section we explain how $K$ is determined). We denote the control inputs and the applied control inputs at stage $k$ as $u_i^k \in U_i$, $i \in \{r, l\}$, and $w_i^k \in W_i$, $i \in \{r, l\}$, respectively.

We assume that the vehicle is equipped with two incremental encoders, each measuring the angular velocity of one of the wheels. When using incremental encoders, the angular velocity is considered constant inside the given observation stage (see [PTPZ07] and [EECK07]). Thus, the applied controls are considered piecewise constant, i.e., $w_i: [(k-1)\Delta t, k\Delta t] \to W_i$, $i \in \{r, l\}$, are constant over each stage.

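Incremental encoder model: As shown in [PTPZ07], the measurement resolution of an incremental encoder is constant, and for encoder $i$ we denote it as $\Delta\epsilon_i$, $i \in \{r, l\}$. Given $\Delta\epsilon_i$ and $[\epsilon_i^{min}, \epsilon_i^{max}]$, $i \in \{r, l\}$, the following holds:

$$\exists n_i \in \mathbb{Z}^+ \text{ s.t. } n_i \Delta\epsilon_i = |\epsilon_i^{max} - \epsilon_i^{min}|, \quad i \in \{r, l\}.$$

For more details see Sec. IX, where we also explain how to obtain the measurement resolutions and the probability density functions. Then, $[\epsilon_i^{min}, \epsilon_i^{max}]$ can be partitioned¹ into noise intervals of length $\Delta\epsilon_i$: $[\underline{\epsilon}_i^{j_i}, \overline{\epsilon}_i^{j_i}]$, $j_i = 1, \dots, n_i$, $i \in \{r, l\}$. We denote the set of all noise intervals $\mathcal{E}_i = \{[\underline{\epsilon}_i^1, \overline{\epsilon}_i^1], \dots, [\underline{\epsilon}_i^{n_i}, \overline{\epsilon}_i^{n_i}]\}$, $i \in \{r, l\}$. At stage $k$, if the applied control input is $u_i^k + \epsilon_i$, the incremental encoder $i$ will return the measured interval

$$[\underline{w}_i^k, \overline{w}_i^k] = [u_i^k + \underline{\epsilon}_i^{j_i}, u_i^k + \overline{\epsilon}_i^{j_i}],$$

where $\epsilon_i \in [\underline{\epsilon}_i^{j_i}, \overline{\epsilon}_i^{j_i}] \in \mathcal{E}_i$, $i \in \{r, l\}$. In Fig. 1 we give an example. The pair of measured intervals at stage $k$, $([\underline{w}_r^k, \overline{w}_r^k], [\underline{w}_l^k, \overline{w}_l^k])$, returned by the incremental encoders, is denoted $W^k$.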
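The encoder model above can be sketched in a few lines: partition the noise support into $n_i$ intervals of length $\Delta\epsilon_i$ and map an applied input $u_i^k + \epsilon_i$ to the measured interval of the noise interval containing $\epsilon_i$. The numeric values below are hypothetical; only the structure (three intervals, as in Fig. 1) follows the text.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]


def noise_intervals(eps_min: float, eps_max: float, delta: float) -> List[Interval]:
    """Split [eps_min, eps_max] into n intervals of length delta.

    Requires n * delta = eps_max - eps_min for a positive integer n; neighbouring
    intervals share an endpoint (the relaxed notion of partition in footnote 1)."""
    ratio = (eps_max - eps_min) / delta
    n = round(ratio)
    if n < 1 or not math.isclose(ratio, n, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("support length must be a positive integer multiple of delta")
    # Use eps_max itself as the last endpoint so rounding cannot shrink the support.
    bounds = [eps_min + j * delta for j in range(n)] + [eps_max]
    return list(zip(bounds[:-1], bounds[1:]))


def measured_interval(u: float, eps: float, intervals: List[Interval]) -> Interval:
    """Encoder reading for applied input u + eps: [u + lower, u + upper] of the noise
    interval containing eps (the first match if eps sits on a shared endpoint)."""
    for lower, upper in intervals:
        if lower <= eps <= upper:
            return (u + lower, u + upper)
    raise ValueError("eps lies outside the noise support")


if __name__ == "__main__":
    E_r = noise_intervals(-0.3, 0.3, 0.2)        # n_r = 3, as in Fig. 1
    print(len(E_r))                               # 3
    print(measured_interval(5.0, 0.05, E_r))      # approximately (4.9, 5.1)
```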
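Fig. 1. Let $n_r = 3$, i.e., $[\epsilon_r^{min}, \epsilon_r^{max}]$ is partitioned into 3 noise intervals of length $\Delta\epsilon_r$, $\mathcal{E}_r = \{[\underline{\epsilon}_r^1, \overline{\epsilon}_r^1], [\underline{\epsilon}_r^2, \overline{\epsilon}_r^2], [\underline{\epsilon}_r^3, \overline{\epsilon}_r^3]\}$. Assume the applied control input at stage $k$ is $u_r^k + \epsilon_r$, such that $\epsilon_r \in [\underline{\epsilon}_r^2, \overline{\epsilon}_r^2]$. Then, the incremental encoder $r$, at stage $k$, will return the measured interval $[\underline{w}_r^k, \overline{w}_r^k] = [u_r^k + \underline{\epsilon}_r^2, u_r^k + \overline{\epsilon}_r^2]$.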



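The vehicle moves in a planar environment in which a set of non-overlapping regions of interest, denoted $R$, is present. Let $\Pi$ be the set of propositions satisfied at the regions in the environment. One of these propositions, denoted by $\pi_u \in \Pi$, signifies that the corresponding regions are unsafe. In this work, the motion specification is expressed as a BLTL formula over $\Pi$:

$$\phi = \neg\pi_u \,\mathcal{U}^{\le T_1} \big(\varphi_1 \wedge \neg\pi_u \,\mathcal{U}^{\le T_2} (\varphi_2 \wedge \dots \wedge \neg\pi_u \,\mathcal{U}^{\le T_l} \varphi_l)\big), \tag{2}$$

where $l \in \mathbb{Z}^+$, and $\varphi_j$, $\forall j \in \{1, \dots, l\}$, is of the following form:

$$\varphi_j = \mathcal{G}^{\le \bar T_j^1} \pi_j^1 \vee \dots \vee \mathcal{G}^{\le \bar T_j^{n_j}} \pi_j^{n_j},$$

where $n_j \in \mathbb{Z}^+$.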
Example 1: Consider the environment shown in Fig. 2. Let $\Pi = \{\pi_u, \pi_p, \pi_t, \pi_d\}$, where $\pi_u$, $\pi_p$, $\pi_t$, $\pi_d$ label the unsafe, pick-up, test, and drop-off regions, respectively. Let the motion specification be as follows:

Start from an initial state $q_{init}$ and reach a pick-up region within $T_1$ time units to pick up a load. After entering the pick-up region, reach a test region within $T_2$ time units and stay in it at least $\bar T_2$ time units. Finally, after entering the test region, reach a drop-off region within $T_3$ time units to drop off the load. Always avoid the unsafe regions.

The specification translates to BLTL formula $\phi$:

$$\phi = \neg\pi_u \,\mathcal{U}^{\le T_1} \big(\pi_p \wedge \neg\pi_u \,\mathcal{U}^{\le T_2} (\mathcal{G}^{\le \bar T_2} \pi_t \wedge \neg\pi_u \,\mathcal{U}^{\le T_3} \pi_d)\big). \tag{3}$$
Fig. 2. An example environment with the regions of interest. The unsafe, pick-up, test, and drop-off regions are shown in red, blue, cyan, and green, respectively. A sample state (position) trajectory of the system is shown in magenta.

We assume that the vehicle can precisely determine its initial state $q_{init} = [x_{init}, y_{init}, \theta_{init}]^T$ in a known map of the environment. While the vehicle moves, incremental encoder measurements are available at each stage $k$. We define a vehicle control strategy as a map that takes as input a sequence of pairs of measured intervals $W^1 W^2 \dots W^{k-1}$ and returns control inputs $u_r^k \in U_r$ and $u_l^k \in U_l$ at stage $k$. We are ready to formulate the main problem we consider in this paper:

Problem 1: Given a set of regions of interest $R$ satisfying propositions from a set $\Pi$, a vehicle model described by Eqn. (1) with initial state $q_{init}$, and a motion specification expressed as a BLTL formula $\phi$ over $\Pi$ (Eqn. (2)), find a vehicle control strategy that maximizes the probability of satisfying the specification.

To fully specify Problem 1 we need to define the satisfaction of a BLTL formula by a trajectory $q: [0, K\Delta t] \to SE(2)$ of the system from Eqn. (1). A formal definition is given in Sec. IV. Informally, $q(t)$ produces a finite trace $\sigma = (o_1, t_1)(o_2, t_2)\dots(o_l, t_l)$, $o_i \in \Pi \cup \{\emptyset\}$, $t_i \in \mathbb{R}_{\ge 0}$, $i \ge 1$, where $o_i$ is the satisfied proposition² and $t_i$ is the time spent satisfying $o_i$ as time evolves. A trajectory $q(t)$ satisfies BLTL formula $\phi$ if and only if the generated trace satisfies the formula. Given $\phi$, for the duration of the motion we use the smallest $K \in \mathbb{Z}^+$ for which model checking a trace is well defined, i.e., the smallest $K$ for which the maximum nested sum of time bounds (see [ZPC10]) is at most $K\Delta t$.



¹ Throughout the paper, we relax the notion of partition by allowing the endpoints of the intervals to overlap.

² Since the regions of interest are non-overlapping, it follows that $o_i \in \Pi \cup \{\emptyset\}$.



B. Approach 

In this paper, we develop a suboptimal solution to Problem 1 consisting of three steps. First, we define a finite state MDP that captures every sequence realization of pairs of measurements returned by the incremental encoders. States of the MDP correspond to the sequences of pairs of measured intervals and the actions correspond to the control inputs.

Second, we find a control policy for the MDP that maximizes the probability of satisfying BLTL formula $\phi$. Because of the size of the MDP, finding the exact solution is computationally too expensive. We therefore trade off correctness for scalability and use a computationally efficient technique based on system sampling. We use recent results in SMC for MDPs ([HMZ+12]) to obtain an MDP control policy and the BIE algorithm ([ZPC10]) to estimate the probability of satisfying $\phi$.

Finally, since each state of the MDP corresponds to a unique sequence of pairs of measured intervals, we translate the control policy to a vehicle control strategy. In addition, we show that the probability of satisfying $\phi$ in the original environment is bounded from below by the probability of satisfying the specification on the MDP under the obtained control policy.
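A minimal sketch of the BIE loop, as we understand it from [ZPC10]: with a Beta($\alpha$, $\beta$) prior on the unknown satisfaction probability, each simulated trace updates the posterior, and sampling stops once the posterior mass of an interval of half-width $\delta$ around the posterior mean reaches the coverage $c$ (the interval is shifted to stay inside $[0, 1]$). We restrict the prior parameters to integers so the Beta CDF can be computed exactly with the standard library. The `sample` callable is a hypothetical stand-in for simulating $M$ under the obtained policy and checking the resulting trace against $\phi$; this is not the authors' implementation.

```python
import math
import random
from typing import Callable, Tuple


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a, b, using the identity
    I_x(a, b) = P[Binomial(a + b - 1, x) >= a]; terms are summed in log space."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    log_n_fact = math.lgamma(n + 1)
    total = sum(math.exp(log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                         + j * log_x + (n - j) * log_1mx)
                for j in range(a, n + 1))
    return min(total, 1.0)


def bayesian_interval_estimation(sample: Callable[[], bool], delta: float = 0.05,
                                 coverage: float = 0.95, alpha: int = 1, beta: int = 1,
                                 max_samples: int = 100_000
                                 ) -> Tuple[float, Tuple[float, float], int]:
    """Draw Bernoulli samples until the Beta posterior puts at least `coverage`
    mass on an interval of half-width `delta` around the posterior mean."""
    successes = 0
    for n in range(1, max_samples + 1):
        successes += 1 if sample() else 0
        a, b = successes + alpha, n - successes + beta     # posterior Beta(a, b)
        p_hat = a / (a + b)                                 # posterior mean
        t0, t1 = p_hat - delta, p_hat + delta
        if t1 > 1.0:
            t0, t1 = 1.0 - 2 * delta, 1.0
        if t0 < 0.0:
            t0, t1 = 0.0, 2 * delta
        gamma = beta_cdf_int(t1, a, b) - beta_cdf_int(t0, a, b)
        if gamma >= coverage:
            return p_hat, (t0, t1), n
    raise RuntimeError("requested coverage not reached within max_samples")


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sampler: each call stands for one simulated run of the MDP.
    p_hat, interval, n = bayesian_interval_estimation(lambda: rng.random() < 0.3)
    print(p_hat, interval, n)
```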

IV. Generating a trace 

In this section we explain how, given a state trajectory, the corresponding trace is generated. Let us denote $[\pi] = \{(x, y) \in \mathbb{R}^2 \mid (x, y) \in \bigcup_{r \in R_\pi} r\}$ as the set of positions that satisfy proposition $\pi$, where $R_\pi \subseteq R$ is the set of regions labeled with proposition $\pi$.

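Definition 3 (Generating a trace): The trace corresponding to a state trajectory $q(t) = [x(t), y(t), \theta(t)]^T$ is a finite sequence $\sigma = (o_1, t_1)(o_2, t_2)\dots(o_l, t_l)$, $o_i \in \Pi \cup \{\emptyset\}$, $t_i \in [0, K\Delta t]$, $i = 1, \dots, l$, $l \ge 1$, where $o_i$ is the satisfied proposition and $t_i$ is the time spent satisfying $o_i$, generated according to the following rules, for all $t, t', \tau \in [0, K\Delta t]$:

• $o_1 = \pi \in \Pi$ iff $(x(0), y(0)) \in [\pi]$, and $o_1 = \emptyset$ otherwise.

• Let $o_i$ be the satisfied proposition at some $t$. Then:

1) If $o_i = \emptyset$, then $o_{i+1} = \pi \in \Pi$ iff (i) $\exists t' > t$ s.t. $(x(t'), y(t')) \in [\pi]$, and (ii) $(x(\tau), y(\tau)) \notin [\pi']$ for all $\tau \in [t, t')$ and all $\pi' \in \Pi$, and $t_i = \min_{t \in [\sum_{j=0}^{i-1} t_j, K\Delta t]} \{t \mid (x(t), y(t)) \in [\pi]\} - \sum_{j=0}^{i-1} t_j$, with $t_0 = 0$.

2) If $o_i = \pi \in \Pi$, then $o_{i+1} = \emptyset$ iff $\exists t' > t$ s.t. $(x(t'), y(t')) \notin [\pi]$, and $t_i = \min_{t \in [\sum_{j=0}^{i-1} t_j, K\Delta t]} \{t \mid (x(t), y(t)) \notin [\pi]\} - \sum_{j=0}^{i-1} t_j$, with $t_0 = 0$.

• Let $o_l$ be the satisfied proposition at $t = K\Delta t$. Then, $t_l = K\Delta t - \sum_{j=1}^{l-1} t_j$.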
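Definition 3 is stated in continuous time. A simple numerical approximation, assumed here rather than taken from the paper, samples the position every $h$ time units, labels each sample with the proposition of the region containing it (or the empty set, encoded as `None`), and merges consecutive samples with the same label; entry and exit times are then resolved only up to $h$. The labeling function and the one-region map in the example are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Trace = List[Tuple[Optional[str], float]]   # (o_i, t_i); None stands for the empty set


def generate_trace(positions: List[Tuple[float, float]], h: float,
                   label: Callable[[float, float], Optional[str]]) -> Trace:
    """Approximate Def. 3 from positions sampled every h time units.

    Each sample is taken to represent h time units; consecutive samples with the
    same label are merged into one (o_i, t_i) pair."""
    trace: Trace = []
    for x, y in positions:
        o = label(x, y)
        if trace and trace[-1][0] == o:
            trace[-1] = (o, trace[-1][1] + h)   # still in the same region: extend t_i
        else:
            trace.append((o, h))                # label changed: start (o_{i+1}, t_{i+1})
    return trace


if __name__ == "__main__":
    # Hypothetical map: a single region labelled "p" occupying 1 <= x < 2.
    def label(x: float, y: float) -> Optional[str]:
        return "p" if 1.0 <= x < 2.0 else None

    samples = [(0.5 * i, 0.0) for i in range(6)]
    print(generate_trace(samples, 0.5, label))  # [(None, 1.0), ('p', 1.0), (None, 1.0)]
```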

A trajectory $q(t)$ satisfies BLTL formula $\phi$ (Eqn. (2)) if and only if the trace generated according to the rules stated above satisfies the formula. Note that, since the duration of the motion is finite, the generated trace is also finite. In [ZPC10] the authors show that BLTL requires only traces of bounded lengths. The fact that the trace $\sigma$ satisfies $\phi$ is denoted $\sigma \models \phi$. Given a trace $\sigma$, the $i$-th state of $\sigma$, denoted $\sigma_i$, is $(o_i, t_i)$, $i = 1, \dots, l$. We denote by $\sigma|_i$ the finite subsequence of $\sigma$ that starts in $\sigma_i$. Finally, given a formula $\phi$, we denote subformula $\neg\pi_u \,\mathcal{U}^{\le T_j} \varphi_j$ as $\tilde\phi_j$, $j = 1, \dots, l$. Using the BLTL semantics one can derive the following conditions to determine whether $\sigma \models \phi$:

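Definition 4 (Satisfaction conditions): Given a trace $\sigma$ and a BLTL formula $\phi$ (Eqn. (2)), let, for $j \in \{1, \dots, l\}$, $i_j, k_j \in \mathbb{N}$ be such that, for some $n \in \{1, \dots, n_j\}$, the following holds:

1) $o_{i_j + k_j} = \pi_j^n$,

2) for each $i_j \le i < i_j + k_j$, $o_i \neq \pi_u$,

3) $\sum_{i=i_j}^{i_j + k_j - 1} t_i < T_j$, and

4) $t_{i_j + k_j} > \bar T_j^n$.

Then, $\sigma|_{i_j} \models \tilde\phi_j$. If $\forall j \in \{1, \dots, l\}$, $\exists i_j, k_j \in \mathbb{N}$ s.t. $\sigma|_{i_j} \models \tilde\phi_j$, where $i_{j+1} = i_j + k_j$ with $i_1 = 1$, then $\sigma \models \phi$.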
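The conditions of Definition 4 can be checked mechanically. The sketch below tries every admissible split $(i_j, k_j)$ by backtracking, since picking the earliest occurrence of $\pi_j^n$ is not necessarily the best choice for the later subformulas. The list-of-pairs encoding of Eqn. (2) is hypothetical, indices are 0-based, $k_j = 0$ is allowed (an assumption about $\mathbb{N}$), and a plain proposition such as $\pi_p$ in Eqn. (3) is encoded as $\mathcal{G}^{\le 0}\pi_p$, i.e., with $\bar T = 0$.

```python
from typing import Dict, List, Optional, Tuple

Trace = List[Tuple[Optional[str], float]]    # (o_i, t_i); None stands for the empty set
Spec = List[Tuple[float, Dict[str, float]]]  # [(T_j, {pi_j^n: Tbar_j^n})] for j = 1..l


def satisfies(trace: Trace, spec: Spec, unsafe: str = "u") -> bool:
    """Check the conditions of Def. 4 by trying every admissible choice of (i_j, k_j)."""
    n = len(trace)

    def match_from(j: int, i: int) -> bool:
        if j == len(spec):                   # every subformula has been matched
            return True
        T_j, targets = spec[j]
        elapsed = 0.0                        # t_i + ... + t_{pos-1}  (condition 3)
        for pos in range(i, n):              # pos plays the role of i_j + k_j
            o, t = trace[pos]
            if o in targets and elapsed < T_j and t > targets[o]:  # conditions 1, 3, 4
                if match_from(j + 1, pos):   # i_{j+1} = i_j + k_j
                    return True
            if o == unsafe:                  # condition 2 fails for every later pos
                return False
            elapsed += t
            if elapsed >= T_j:               # condition 3 fails for every later pos
                return False
        return False

    return match_from(0, 0)


if __name__ == "__main__":
    sigma = [(None, 6.12), ("p", 0.75), (None, 0.44), ("t", 0.61), (None, 1.66), ("d", 1.22)]
    spec = [(6.2, {"p": 0.0}), (2.3, {"t": 0.2}), (2.3, {"d": 0.0})]   # Example 2
    print(satisfies(sigma, spec))  # True
```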

Example 2: Consider the environment and the sample state (position) trajectory shown in Fig. 2. Let $\phi$ be as in Eqn. (3) with the following numerical values for the time bounds: $T_1 = 6.2$, $T_2 = 2.3$, $\bar T_2 = 0.2$, and $T_3 = 2.3$. The trajectory generates trace $\sigma = (\emptyset, 6.12)(\pi_p, 0.75)(\emptyset, 0.44)(\pi_t, 0.61)(\emptyset, 1.66)(\pi_d, 1.22)$. The following holds: $\sigma|_1 \models \tilde\phi_1$ since for $i_1 = 1$ and $k_1 = 1$, $o_2 \in \{\pi_p\}$, $o_1 \neq \pi_u$, and $t_1 < T_1$; $\sigma|_2 \models \tilde\phi_2$ since for $i_2 = 2$ and $k_2 = 2$, $o_4 \in \{\pi_t\}$, $o_2, o_3 \neq \pi_u$, $t_2 + t_3 < T_2$, and $t_4 > \bar T_2$; and $\sigma|_4 \models \tilde\phi_3$ since for $i_3 = 4$ and $k_3 = 2$, $o_6 \in \{\pi_d\}$, $o_4, o_5 \neq \pi_u$, and $t_4 + t_5 < T_3$. Thus, $\sigma \models \phi$.

V. Construction of an MDP Model 

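Recall that $\epsilon_i$ is a random variable with a continuous probability density function supported on the bounded interval $[\epsilon_i^{min}, \epsilon_i^{max}]$, $i \in \{r, l\}$. The probability density functions are obtained through experimental trials (see Sec. IX), and they define the following interval probabilities:

$$\Pr(\epsilon_i \in [\underline{\epsilon}_i^{j_i}, \overline{\epsilon}_i^{j_i}]) = p_i^{j_i}, \tag{4}$$

$[\underline{\epsilon}_i^{j_i}, \overline{\epsilon}_i^{j_i}] \in \mathcal{E}_i$, $j_i = 1, \dots, n_i$, s.t. $\sum_{j_i=1}^{n_i} p_i^{j_i} = 1$, $i \in \{r, l\}$.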

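An MDP $M$ that captures every sequence realization of pairs of measurements returned by the incremental encoders is defined as a tuple $(S, s_0, Act, A, P)$, where:

• $S = \bigcup_{k=0}^{K} \mathcal{W}^k$, where $\mathcal{W} = \{([u_r + \underline{\epsilon}_r, u_r + \overline{\epsilon}_r], [u_l + \underline{\epsilon}_l, u_l + \overline{\epsilon}_l]) \mid u_r \in U_r, u_l \in U_l, [\underline{\epsilon}_r, \overline{\epsilon}_r] \in \mathcal{E}_r, [\underline{\epsilon}_l, \overline{\epsilon}_l] \in \mathcal{E}_l\}$ is the set of all possible pairs of measured intervals and $\mathcal{W}^0 = \{\emptyset\}$. The meaning of the state is as follows: $(W^1, \dots, W^k) \in S$ means that at stage $i$, $1 \le i \le k$, the pair of measured intervals is $W^i$.

• $s_0 = \emptyset$, i.e., the empty sequence, is the initial state.

• $Act = (U_r \times U_l) \cup \{\varphi\}$ is the set of actions, where $\varphi$ is a dummy action.

• $A: S \to 2^{Act}$ gives the enabled actions at state $s$: if $|s| = K$, i.e., if the termination time is reached, $A(s) = \{\varphi\}$; otherwise $A(s) = U_r \times U_l$.

• $P: S \times Act \times S \to [0, 1]$ is a transition probability function constructed by the following rules:

1) If $s = (W^1, \dots, W^k) \in S$ with $k < K$, then $P(s, a, s') = p_r^m p_l^n$ iff $s' = (W^1, \dots, W^k, ([u_r + \underline{\epsilon}_r^m, u_r + \overline{\epsilon}_r^m], [u_l + \underline{\epsilon}_l^n, u_l + \overline{\epsilon}_l^n])) \in S$ and $a = (u_r, u_l) \in U_r \times U_l$, where $m = 1, \dots, n_r$ and $n = 1, \dots, n_l$;

2) If $|s| = K$, then $P(s, a, s') = 1$ iff $a = \varphi$ and $s' = s$;

3) $P(s, a, s') = 0$ otherwise.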



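Rule 1) defined above follows from the fact that, given the control inputs $u_r^{k+1}$ and $u_l^{k+1}$ at stage $k+1$, the pair of measured intervals at stage $k+1$ is $([u_r^{k+1} + \underline{\epsilon}_r^m, u_r^{k+1} + \overline{\epsilon}_r^m], [u_l^{k+1} + \underline{\epsilon}_l^n, u_l^{k+1} + \overline{\epsilon}_l^n])$ with probability $p_r^m p_l^n$, since $\Pr(\epsilon_r \in [\underline{\epsilon}_r^m, \overline{\epsilon}_r^m]) = p_r^m$ and $\Pr(\epsilon_l \in [\underline{\epsilon}_l^n, \overline{\epsilon}_l^n]) = p_l^n$, which follows from Eqn. (4); the product form treats the noise on the two wheels as independent (see the MDP fragment in Fig. 3). Rule 2) states that if the length of $s$ is equal to $K$, i.e., if the termination time is reached, then $A(s) = \{\varphi\}$ with $P(s, \varphi, s) = 1$.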
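Rules 1) to 3) translate directly into a successor function. In the sketch below a state is a tuple of interval pairs (the empty tuple playing the role of $s_0$), an action is a pair $(u_r, u_l)$, and `None` stands for the dummy action $\varphi$; these encodings, the noise intervals, and the probabilities in the example are hypothetical. Summing the returned probabilities also illustrates numerically the normalization argument used in the proof of Prop. 1.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
Pair = Tuple[Interval, Interval]        # W = (right-wheel interval, left-wheel interval)
State = Tuple[Pair, ...]                # s = (W^1, ..., W^k); () plays the role of s_0
Action = Optional[Tuple[float, float]]  # (u_r, u_l), or None for the dummy action


def successors(s: State, action: Action, E_r: List[Interval], p_r: List[float],
               E_l: List[Interval], p_l: List[float], K: int) -> Dict[State, float]:
    """Nonzero entries of P(s, action, .); an empty dict means the action is not enabled."""
    if len(s) == K:                          # rule 2): only the dummy action, as a self-loop
        return {s: 1.0} if action is None else {}
    if action is None:                       # rule 3): the dummy action is not enabled before K
        return {}
    u_r, u_l = action
    dist: Dict[State, float] = {}
    for (lo_r, hi_r), pr in zip(E_r, p_r):   # rule 1): one successor per pair of noise intervals
        for (lo_l, hi_l), pl in zip(E_l, p_l):
            nxt = s + (((u_r + lo_r, u_r + hi_r), (u_l + lo_l, u_l + hi_l)),)
            dist[nxt] = dist.get(nxt, 0.0) + pr * pl
    return dist


if __name__ == "__main__":
    E = [(-0.25, 0.0), (0.0, 0.25)]          # n_r = n_l = 2, as in Fig. 3
    succ = successors((), (1.0, 2.0), E, [0.25, 0.75], E, [0.5, 0.5], K=2)
    print(len(succ), sum(succ.values()))     # 4 1.0
```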
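Fig. 3. A fragment of the MDP $M$ where $n_r = n_l = 2$. Thus, $p_r^m = \Pr(\epsilon_r \in [\underline{\epsilon}_r^m, \overline{\epsilon}_r^m])$, for $m = 1, 2$, and $p_l^n = \Pr(\epsilon_l \in [\underline{\epsilon}_l^n, \overline{\epsilon}_l^n])$, for $n = 1, 2$. Action $(u_r, u_l) \in A(s)$ enables four transitions. For example, given state $s = (W^1)$, the new state is $(W^1, W^2)$, where $W^2 = ([u_r + \underline{\epsilon}_r^m, u_r + \overline{\epsilon}_r^m], [u_l + \underline{\epsilon}_l^n, u_l + \overline{\epsilon}_l^n])$, with probability $p_r^m p_l^n$. This corresponds to applied control inputs being equal to $u_r + \epsilon_r$ and $u_l + \epsilon_l$, where $\epsilon_r \in [\underline{\epsilon}_r^m, \overline{\epsilon}_r^m]$ and $\epsilon_l \in [\underline{\epsilon}_l^n, \overline{\epsilon}_l^n]$.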



Proposition 1: The model $M$ defined above is a valid MDP, i.e., it satisfies the Markov property and $P$ is a transition probability function.

Proof: The proof follows from the construction of $P$. Given the current state $s \in S$ and an action $a \in A(s)$, the conditional probability distribution of future states depends only on the current state $s$, not on the sequence of events that preceded it (see rule 1) above). Thus, the Markov property holds. In addition, for every $s$ with $|s| < K$ and $a \in A(s)$: $\sum_{s' \in S} P(s, a, s') = \sum_{m=1}^{n_r} \sum_{n=1}^{n_l} p_r^m p_l^n = \sum_{m=1}^{n_r} p_r^m \sum_{n=1}^{n_l} p_l^n = 1$, and for $|s| = K$, $P(s, \varphi, s) = 1$ by rule 2). It follows that $P$ is a valid transition probability function. ■

VI. Position uncertainty 
A. Nominal state trajectory 

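For each interval belonging to the set of noise intervals $\mathcal{E}_i$ we define a representative value $\epsilon_i^{j_i} = (\underline{\epsilon}_i^{j_i} + \overline{\epsilon}_i^{j_i})/2$, $j_i = 1, \dots, n_i$, $i \in \{r, l\}$, i.e., $\epsilon_i^{j_i}$ is the midpoint of interval $[\underline{\epsilon}_i^{j_i}, \overline{\epsilon}_i^{j_i}] \in \mathcal{E}_i$, $i \in \{r, l\}$. We denote the set of representative values as $E_i = \{\epsilon_i^1, \dots, \epsilon_i^{n_i}\}$, $i \in \{r, l\}$.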

We use $q^k(t)$, $w_r^k$ and $w_l^k$, $t \in [(k-1)\Delta t, k\Delta t]$, $k = 1, \dots, K$, to denote the state trajectory and the constant applied controls at stage $k$, respectively. With a slight abuse of notation, we use $q^k$ to denote the end of state trajectory $q^k(t)$, i.e., $q^k = q^k(k\Delta t)$. Given state $q^{k-1}$, the state trajectory $q^k(t)$ can be derived by integrating the system given by Eqn. (1) from the initial state $q^{k-1}$, taking into account that the applied controls are constant and equal to $w_r^k$ and $w_l^k$. Throughout the paper, we will also denote this trajectory by $q^k(q^{k-1}, w_r^k, w_l^k, t)$ when we want to explicitly capture the initial state $q^{k-1}$ and the constant applied controls $w_r^k$ and $w_l^k$.
Given a path through the MDP:

$$s_0 \to s_1 \to s_2 \dots s_{K-1} \to s_K, \tag{5}$$

where $s_k = (W^1, \dots, W^k)$, with $W^k = ([u_r^k + \underline{\epsilon}_r^k, u_r^k + \overline{\epsilon}_r^k], [u_l^k + \underline{\epsilon}_l^k, u_l^k + \overline{\epsilon}_l^k])$, $k = 1, \dots, K$, we define the nominal state trajectory $q(t)$, $t \in [0, K\Delta t]$, as follows:

$$q(t) = q^k(q^{k-1}, u_r^k + \epsilon_r^k, u_l^k + \epsilon_l^k, t), \quad t \in [(k-1)\Delta t, k\Delta t],$$

$k = 1, \dots, K$, where $\epsilon_i^k \in E_i$ is such that $\epsilon_i^k \in [\underline{\epsilon}_i^k, \overline{\epsilon}_i^k]$, $i \in \{r, l\}$, and $q^0 = q_{init}$. For every path through the MDP, its nominal state trajectory is well defined. The next step is to define the uncertainty evolution along the nominal state trajectory, since the applied controls can take any value within the measured intervals.

B. Position uncertainty evolution 

Since a motion specification is a statement about the propositions satisfied by the regions of interest in the environment, in order to answer whether some state trajectory satisfies a BLTL formula it is sufficient to know its projection in $\mathbb{R}^2$. Therefore, we focus only on the position uncertainty.

The position uncertainty of the vehicle when its nominal position is $(x, y) \in \mathbb{R}^2$ is modeled as a disc centered at $(x, y)$ with radius $d \in \mathbb{R}_{\ge 0}$, where $d$ denotes the distance uncertainty:

$$D((x, y), d) = \{(x', y') \in \mathbb{R}^2 \mid \|(x, y), (x', y')\| \le d\}, \tag{6}$$

where $\|\cdot\|$ denotes the Euclidean distance. Next, we explain how to obtain $d$.

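First, let $\Delta\theta \in S^1$ denote the orientation uncertainty. Let $q(t)$, $t \in [0, K\Delta t]$, be the nominal state trajectory corresponding to a path through the MDP (Eqn. (5)). Then, $q(t)$ can be partitioned into $K$ state trajectories: $q^k(t) = q^k(q^{k-1}, u_r^k + \epsilon_r^k, u_l^k + \epsilon_l^k, t)$, $t \in [(k-1)\Delta t, k\Delta t]$, $k = 1, \dots, K$, where $\epsilon_i^k \in E_i$ is such that $\epsilon_i^k \in [\underline{\epsilon}_i^k, \overline{\epsilon}_i^k] \in \mathcal{E}_i$, $i \in \{r, l\}$, and $q^0 = q_{init}$ (see Fig. 4). The distance and orientation uncertainty at state $q^k$ are denoted as $d^k$ and $\Delta\theta^k$, respectively. We set $d^k$ and $\Delta\theta^k$ at state $q^k = [x^k, y^k, \theta^k]^T$ equal to:

$$d^k = \max_{[x', y', \theta']^T \in \mathcal{M}^k} \|(x^k, y^k), (x', y')\| + d^{k-1}, \tag{7}$$

$$\Delta\theta^k = \max_{[x', y', \theta']^T \in \mathcal{M}^k} |\theta^k - \theta'|, \tag{8}$$

where

$$\mathcal{M}^k = \{q^k([x^{k-1}, y^{k-1}, \theta^{k-1} + \alpha]^T, u_r^k + \epsilon_r, u_l^k + \epsilon_l, k\Delta t) \mid \alpha \in \{\Delta\theta^{k-1}, -\Delta\theta^{k-1}\}, \epsilon_r \in \{\underline{\epsilon}_r^k, \overline{\epsilon}_r^k\}, \epsilon_l \in \{\underline{\epsilon}_l^k, \overline{\epsilon}_l^k\}\},$$

for $k = 1, \dots, K$, where $d^0 = 0$ and $\Delta\theta^0 = 0$.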

Eqns. (7) and (8) are obtained using a worst-case scenario assumption. At stage $k$, the pair of measured intervals is $W^k = ([u_r^k + \underline{\epsilon}_r^k, u_r^k + \overline{\epsilon}_r^k], [u_l^k + \underline{\epsilon}_l^k, u_l^k + \overline{\epsilon}_l^k])$ and we use the endpoints of the measured intervals to define set $\mathcal{M}^k$. $\mathcal{M}^k$ is the smallest set of points in $SE(2)$, at the end of stage $k$, guaranteed to contain (i) the state with the maximum distance (in the Euclidean sense) from $q^k$ given that the applied controls at stage $i$ are within the measured intervals at stage $i$, and (ii) the state with the maximum orientation difference compared to $q^k$ given that the applied controls at stage $i$ are within the measured intervals at stage $i$, $i = 1, \dots, k$ (for more details about $\mathcal{M}^k$ see [FMAG98]). An example is given in Fig. 4.

Fig. 4. Left: Evolution of the position uncertainty along the nominal state trajectory $q(t) = [x(t), y(t), \theta(t)]^T$, where $q(t)$ is partitioned into 3 state trajectories, $q^k(t)$, $k = 1, 2, 3$. Right: The conservative approximation of region $D((x(t), y(t)), d(t))$ along $q(t)$, where the distance uncertainty trajectory is $d(t') = d^k(t)$, $t' \in [(k-1)\Delta t, k\Delta t]$, where $d^k(t) = d^k$, $k = 1, 2, 3$.

From Eqns. (7) and (8) it follows that, given a nominal state trajectory $q(t)$, $t \in [0, K\Delta t]$, the distance uncertainty grows with time ($d^k \ge d^{k-1}$). The way it changes along $q(t)$ makes it difficult to characterize the exact shape of the position uncertainty region. Instead, we use a conservative approximation of the region. We define $d: [0, K\Delta t] \to \mathbb{R}$ as an approximate distance uncertainty trajectory and we set $d(t) = d^k$, $t \in [(k-1)\Delta t, k\Delta t]$, $k = 1, \dots, K$, i.e., we set the distance uncertainty along the state trajectory $q^k(t)$ equal to the maximum value of the distance uncertainty along $q^k(t)$, which is attained at state $q^k$. An example illustrating this idea is given in Fig. 4.
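Eqns. (7) and (8) and the piecewise-constant $d(t)$ can be sketched as follows. Because Eqn. (1) is not reproduced in this text, the sketch assumes standard differential drive kinematics with constant wheel velocities over a stage. The nominal controls are the midpoints of the measured intervals, which equal $u_i^k + \epsilon_i^k$ with $\epsilon_i^k \in E_i$, and $\mathcal{M}^k$ is formed from the interval endpoints and the heading perturbations $\pm\Delta\theta^{k-1}$. Function names and the example path are hypothetical.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]
Interval = Tuple[float, float]
Measurement = Tuple[Interval, Interval]   # W^k = (right-wheel interval, left-wheel interval)


def stage_endpoint(q: Pose, w_r: float, w_l: float, dt: float, r: float, L: float) -> Pose:
    """Closed-form end pose after dt with constant wheel speeds (assumed kinematics).
    The heading is left unwrapped so |theta^k - theta'| in Eqn. (8) is a plain difference."""
    x, y, th = q
    v = 0.5 * r * (w_r + w_l)
    omega = r * (w_r - w_l) / L
    if abs(omega) < 1e-12:
        return x + v * dt * math.cos(th), y + v * dt * math.sin(th), th
    th_new = th + omega * dt
    return (x + (v / omega) * (math.sin(th_new) - math.sin(th)),
            y - (v / omega) * (math.cos(th_new) - math.cos(th)),
            th_new)


def propagate_uncertainty(q_init: Pose, path: List[Measurement], dt: float, r: float, L: float):
    """Nominal end poses q^k, distance uncertainties d^k (Eqn. (7)) and orientation
    uncertainties dtheta^k (Eqn. (8)) for k = 1..K."""
    q, d, dth = q_init, 0.0, 0.0             # q^0 = q_init, d^0 = 0, dtheta^0 = 0
    poses, ds, dths = [], [], []
    for (lo_r, hi_r), (lo_l, hi_l) in path:
        q_prev = q
        # Nominal controls: midpoints of the measured intervals.
        q = stage_endpoint(q_prev, 0.5 * (lo_r + hi_r), 0.5 * (lo_l + hi_l), dt, r, L)
        # M^k: start heading perturbed by +/- dtheta^{k-1}, controls at interval endpoints.
        M = [stage_endpoint((q_prev[0], q_prev[1], q_prev[2] + a), w_r, w_l, dt, r, L)
             for a in (dth, -dth) for w_r in (lo_r, hi_r) for w_l in (lo_l, hi_l)]
        d = max(math.hypot(m[0] - q[0], m[1] - q[1]) for m in M) + d
        dth = max(abs(q[2] - m[2]) for m in M)
        poses.append(q)
        ds.append(d)
        dths.append(dth)
    return poses, ds, dths


def d_of_t(t: float, ds: List[float], dt: float) -> float:
    """Piecewise-constant approximation d(t) = d^k for t in [(k-1)dt, k dt]."""
    k = min(max(1, math.ceil(t / dt)), len(ds))
    return ds[k - 1]


if __name__ == "__main__":
    path = [((1.9, 2.1), (1.9, 2.1))] * 2    # two stages, symmetric measured intervals
    poses, ds, dths = propagate_uncertainty((0.0, 0.0, 0.0), path, 1.0, 0.5, 0.5)
    print(poses, ds, dths)
```

On the example path, the nominal trajectory is a straight line while $d^k$ grows from stage to stage, which is the qualitative behaviour shown in Fig. 4.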

Proposition 2: Given a path through the MDP $M$ (Eqn. (5)), and the corresponding $q(t)$ and $d(t)$, $t \in [0, K\Delta t]$, as defined above, any state trajectory $q'(t) = q^k(q'^{k-1}, u_r^k + \epsilon_r, u_l^k + \epsilon_l, t)$, $t \in [(k-1)\Delta t, k\Delta t]$, $k = 1, \dots, K$, where $q'^0 = q_{init}$, $\epsilon_r \in [\underline{\epsilon}_r^k, \overline{\epsilon}_r^k]$ and $\epsilon_l \in [\underline{\epsilon}_l^k, \overline{\epsilon}_l^k]$, is within the uncertainty region, i.e., $(x'(t), y'(t)) \in D((x(t), y(t)), d(t))$, $\forall t \in [0, K\Delta t]$.

Proof: The proof follows from the definition of the approximate distance uncertainty trajectory and Eqns. (6), (7), and (8). ■

VII. Generating a trace under the position uncertainty

Let $q(t)$ be a nominal state trajectory with the distance uncertainty trajectory $d(t)$, $t \in [0, K\Delta t]$. In this section we introduce a set of conservative rules according to which the trace corresponding to the uncertainty region $D((x(t), y(t)), d(t))$ is generated. These rules guarantee that if the generated trace satisfies $\phi$ (Eqn. (2)), then any state (position) trajectory inside $D((x(t), y(t)), d(t))$ will satisfy $\phi$.


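Definition 5 (Generating a trace under uncertainty): The trace corresponding to an uncertainty region $D((x(t), y(t)), d(t))$ is a finite sequence $\sigma = (o_1, t_1)(o_2, t_2)\dots(o_l, t_l)$, $o_i \in \Pi \cup \{\emptyset\}$, $t_i \in [0, K\Delta t]$, $i = 1, \dots, l$, $l \ge 1$, where $o_i$ is the satisfied proposition and $t_i$ is the time spent satisfying $o_i$, generated according to the following rules, for all $t, t', \tau \in [0, K\Delta t]$:

• $o_1 = \pi \in \Pi \setminus \{\pi_u\}$ iff $D((x(0), y(0)), d(0)) \subseteq [\pi]$; $o_1 = \pi_u$ iff $D((x(0), y(0)), d(0)) \cap [\pi_u] \neq \emptyset$; and $o_1 = \emptyset$ otherwise.

• Let $o_i$ be the satisfied proposition at some $t$. Then:

1) If $o_i = \pi \in \Pi \setminus \{\pi_u\}$, then $o_{i+1} = \emptyset$ iff $\exists t' > t$ s.t. $D((x(t'), y(t')), d(t')) \not\subseteq [\pi]$, and $t_i = \min_{t \in [\sum_{j=0}^{i-1} t_j, K\Delta t]} \{t \mid D((x(t), y(t)), d(t)) \not\subseteq [\pi]\} - \sum_{j=0}^{i-1} t_j$, with $t_0 = 0$.

2) If $o_i = \pi_u$, then $o_{i+1} = \emptyset$ iff $\exists t' > t$ s.t. $D((x(t'), y(t')), d(t')) \cap [\pi_u] = \emptyset$, and $t_i = \min_{t \in [\sum_{j=0}^{i-1} t_j, K\Delta t]} \{t \mid D((x(t), y(t)), d(t)) \cap [\pi_u] = \emptyset\} - \sum_{j=0}^{i-1} t_j$, with $t_0 = 0$.

3) If $o_i = \emptyset$, then $o_{i+1} = \pi \in \Pi \setminus \{\pi_u\}$ iff

a) $\exists t' > t$ s.t. $D((x(t'), y(t')), d(t')) \subseteq [\pi]$,

b) $\nexists \tau \in [t, t')$ s.t. $D((x(\tau), y(\tau)), d(\tau)) \subseteq [\pi']$ for some $\pi' \in \Pi \setminus \{\pi_u\}$, and

c) $\nexists \tau \in [t, t']$ s.t. $D((x(\tau), y(\tau)), d(\tau)) \cap [\pi_u] \neq \emptyset$,

and $t_i = \min_{t \in [\sum_{j=0}^{i-1} t_j, K\Delta t]} \{t \mid D((x(t), y(t)), d(t)) \subseteq [\pi]\} - \sum_{j=0}^{i-1} t_j$, with $t_0 = 0$.

4) If $o_i = \emptyset$, then $o_{i+1} = \pi_u$ iff

a) $\exists t' > t$ s.t. $D((x(t'), y(t')), d(t')) \cap [\pi_u] \neq \emptyset$, and

b) $\nexists \tau \in [t, t']$ s.t. $D((x(\tau), y(\tau)), d(\tau)) \subseteq [\pi']$ for some $\pi' \in \Pi \setminus \{\pi_u\}$,

and $t_i = \min_{t \in [\sum_{j=0}^{i-1} t_j, K\Delta t]} \{t \mid D((x(t), y(t)), d(t)) \cap [\pi_u] \neq \emptyset\} - \sum_{j=0}^{i-1} t_j$, with $t_0 = 0$.

• Let $o_l$ be the satisfied proposition at $t = K\Delta t$. Then, $t_l = K\Delta t - \sum_{j=1}^{l-1} t_j$.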
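The containment and intersection tests in Definition 5 are simple when regions are, for instance, axis-aligned rectangles; this shape is an assumption made only for the sketch (the paper does not restrict region shapes). Checking containment in a single rectangle errs on the side of caution: a disc spanning two adjacent rectangles with the same label is reported as unlabelled. Contact with an unsafe region is checked first, mirroring the conservative treatment of $\pi_u$. Sweeping these labels along $q(t)$ at a fixed sampling step and merging repeats would give an approximation of the trace in Definition 5.

```python
import math
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), hypothetical region shape


def disc_in_rect(cx: float, cy: float, d: float, rect: Rect) -> bool:
    """True if D((cx, cy), d) is a subset of the rectangle."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= cx - d and cx + d <= xmax and ymin <= cy - d and cy + d <= ymax


def disc_meets_rect(cx: float, cy: float, d: float, rect: Rect) -> bool:
    """True if D((cx, cy), d) intersects the rectangle (closest point within distance d)."""
    xmin, ymin, xmax, ymax = rect
    nx, ny = min(max(cx, xmin), xmax), min(max(cy, ymin), ymax)
    return math.hypot(cx - nx, cy - ny) <= d


def conservative_label(cx: float, cy: float, d: float,
                       regions: Dict[str, List[Rect]], unsafe: str = "u") -> Optional[str]:
    """Label of the uncertainty disc in the spirit of Def. 5: the unsafe proposition if the
    disc touches any unsafe region, else a proposition one of whose regions contains the
    whole disc, else None (the empty set)."""
    if any(disc_meets_rect(cx, cy, d, R) for R in regions.get(unsafe, [])):
        return unsafe
    for prop, rects in regions.items():
        if prop != unsafe and any(disc_in_rect(cx, cy, d, R) for R in rects):
            return prop
    return None


if __name__ == "__main__":
    regions = {"u": [(4.0, 0.0, 5.0, 1.0)], "p": [(0.0, 0.0, 2.0, 2.0)]}
    print(conservative_label(1.0, 1.0, 0.5, regions))   # p
    print(conservative_label(1.0, 1.0, 1.5, regions))   # None
    print(conservative_label(3.6, 0.5, 0.5, regions))   # u
```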

In Fig. 5 we show an uncertainty region and the corresponding trace generated according to the rules stated above. Next, we show that if the trace corresponding to an uncertainty region satisfies $\phi$, then any state (position) trajectory inside the uncertainty region also satisfies $\phi$.

Proposition 3: Let $D((x(t), y(t)), d(t))$ be the uncertainty region corresponding to a path through the MDP $M$ (Eqn. (5)) and let $q'(t)$ be any state trajectory as defined in Prop. 2. Let $\sigma^D = (o_1^D, t_1^D)\dots(o_k^D, t_k^D)$ and $\sigma^q = (o_1^q, t_1^q)\dots(o_l^q, t_l^q)$ be the corresponding traces. Given BLTL formula $\phi$ (Eqn. (2)), if $\sigma^D \models \phi$, then $\sigma^q \models \phi$.

Proof: First, we state two relations between the given traces:

1) Let $o_i^D = \pi \in \Pi \setminus \{\pi_u\}$ for some $i \in \{1, \dots, k\}$. Then, the following holds: $\exists j \in \{1, \dots, l\}$ such that $o_j^q = \pi$ and

$$t_i^D \le t_j^q.$$

Informally, if $t_i^D$ is the time $D((x(t), y(t)), d(t))$ spent inside the region satisfying proposition $\pi$, then $q'(t)$ will spend at least $t_i^D$ time units inside that region.

2) Let $o_i^D = \pi \in \Pi \setminus \{\pi_u\}$ and $o_{i'}^D = \pi' \in \Pi \setminus \{\pi_u\}$ for some $i, i' \in \{1, \dots, k\}$, $i' > i$. Then, the following holds: $\exists j, j' \in \{1, \dots, l\}$, $j' > j$, such that $o_j^q = \pi$ and $o_{j'}^q = \pi'$. In addition,

$$\sum_{h=j}^{j'-1} t_h^q \le \sum_{h=i}^{i'-1} t_h^D.$$

Informally, if the time between $D((x(t), y(t)), d(t))$ entering a region satisfying $\pi$ and then entering a region satisfying $\pi'$ is $\sum_{h=i}^{i'-1} t_h^D$ time units, then the time between $q'(t)$ entering the region satisfying $\pi$ and then entering the region satisfying $\pi'$ is bounded from above by $\sum_{h=i}^{i'-1} t_h^D$. For more intuition about these relations see Fig. 5.
Fig. 5. An uncertainty region and a sample state (position) trajectory inside the uncertainty region are shown in black and magenta, respectively. The corresponding generated traces are $\sigma_D = (\emptyset, 5.72)(\pi_p, 1.24)(\emptyset, 0.87)(\pi_t, 0.24)(\emptyset, 1.96)(\pi_d, 0.82)$ and $\sigma_{q'} = (\emptyset, 5.59)(\pi_p, 1.45)(\emptyset, 0.53)(\pi_t, 0.56)(\emptyset, 1.62)(\pi_d, 1.24)$. Let $\phi$ be as given in Example 2. Then $\sigma_D \models \phi$ and $\sigma_{q'} \models \phi$. Note that for $o_2^D = o_2^{q'} = \pi_p$, $t_2^D \leq t_2^{q'}$ (1st relation above). Also, for $o_2^D = o_2^{q'} = \pi_p$ and $o_4^D = o_4^{q'} = \pi_t$, $\sum_{h=2}^{3} t_h^{q'} \leq \sum_{h=2}^{3} t_h^D$ (2nd relation above).

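Assuming $\sigma_D \models \phi$, then $\forall j \in \{1, \ldots, l\}$, $\exists i_j, k_j \in \mathbb{N}$ and a subformula $\phi_j$ of $\phi$ such that $\sigma_{i_j}^D \models \phi_j$ (by the BLTL semantics). Then, from Prop. 2 and Defs. 3 and 5, it follows that $\forall j \in \{1, \ldots, l\}$, $\exists s_j, z_j \in \mathbb{N}$ such that:

2) for each $s_j \leq i < s_j + z_j$, $o_i^{q'} \neq \pi_u$,

3) $\sum_{h=s_j}^{s_j+z_j-1} t_h^{q'} \leq \sum_{h=i_j}^{i_j+k_j-1} t_h^D \leq T_j$ (2nd relation above),

4) $t_{s_j+z_j}^{q'} \geq t_{i_j+k_j}^D \geq T'_j$ (1st relation above),

where $s_{j+1} = s_j + z_j$ with $s_1 = 1$.

Thus, $\forall j \in \{1, \ldots, l\}$, $\sigma_{s_j}^{q'} \models \phi_j$, and by the BLTL semantics it follows that $\sigma_{q'} \models \phi$. In Fig. 5 we give an example. ■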
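As a numerical check of the two relations used in the proof, the Python sketch below tests them on the traces listed in the Fig. 5 caption. The label `"t"` for the fourth proposition and the helper names are hypothetical; the checks are brute force and only meant for short traces.

```python
from typing import List, Tuple

Trace = List[Tuple[str, float]]  # (proposition, duration); "" is the empty proposition


def relation1(sd: Trace, sq: Trace) -> bool:
    """Each region visit of sigma_D is matched by a visit of sigma_q' at least as long."""
    return all(
        any(oq == od and tq >= td for oq, tq in sq)
        for od, td in sd if od not in ("", "u")
    )


def relation2(sd: Trace, sq: Trace) -> bool:
    """Time between entering two regions along sigma_q' never exceeds that along sigma_D."""
    regs = [i for i, (o, _) in enumerate(sd) if o not in ("", "u")]
    for a, i in enumerate(regs):
        for i2 in regs[a + 1:]:
            gap_d = sum(t for _, t in sd[i:i2])
            if not any(
                sq[j][0] == sd[i][0] and sq[j2][0] == sd[i2][0]
                and sum(t for _, t in sq[j:j2]) <= gap_d
                for j in range(len(sq)) for j2 in range(j + 1, len(sq))
            ):
                return False
    return True


# Traces from the Fig. 5 caption; "t" is a hypothetical label for the fourth entry.
sigma_D = [("", 5.72), ("p", 1.24), ("", 0.87), ("t", 0.24), ("", 1.96), ("d", 0.82)]
sigma_q = [("", 5.59), ("p", 1.45), ("", 0.53), ("t", 0.56), ("", 1.62), ("d", 1.24)]
print(relation1(sigma_D, sigma_q), relation2(sigma_D, sigma_q))  # True True
```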

VIII. Vehicle Control Strategy 

Given the MDP $M$, the next step is to obtain a control policy that maximizes the probability of generating a path through $M$ such that the corresponding trace (as defined in Secs. VI and VII) is satisfying. There are existing approaches that, given an MDP and a temporal logic formula, generate an exact control policy that maximizes the probability of satisfying the specification. In general, however, exact techniques rely on reasoning about the entire state space, which limits their applicability to large problems. Given $U_r$, $U_l$, $n_r$, $n_l$ and $K$, the size of the MDP $M$ is bounded above by $(|U_r| \times |U_l| \times n_r \times n_l)^K$. Even for a simple case study, the size of $M$ makes exact methods for obtaining a control policy computationally too expensive. Therefore, we trade off correctness for scalability and use computationally efficient techniques based on system sampling.
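To illustrate how quickly the bound grows with the horizon, the Python sketch below evaluates $(|U_r| \times |U_l| \times n_r \times n_l)^K$, passing the number of admissible input pairs as a single argument. The values used (3 input pairs, 3 noise intervals per wheel) are hypothetical and chosen only for illustration.

```python
def mdp_size_bound(num_input_pairs: int, n_r: int, n_l: int, K: int) -> int:
    """Upper bound (|U_r| x |U_l| x n_r x n_l)^K on the size of the MDP M.

    num_input_pairs stands for |U_r| x |U_l| (or for a restricted set of input pairs);
    n_r and n_l are the numbers of noise intervals of the right and left encoders.
    """
    return (num_input_pairs * n_r * n_l) ** K


# Hypothetical values: 3 input pairs and 3 noise intervals per wheel.
for K in (1, 3, 6, 9):
    print(K, mdp_size_bound(3, 3, 3, K))   # last line printed: 9 7625597484987
```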

A. Overview 

We obtain a suboptimal control policy by iterating over the control synthesis and the probability estimation procedures until the stopping criterion is met (see Sec. VIII-C). In the control synthesis procedure, we use the control synthesis approach from [HMZ+12] to generate a control policy for the MDP $M$. In particular, we use the control policy optimization part of that algorithm, which consists of the control policy evaluation and the control policy improvement procedures, to incrementally improve a candidate control policy (the control policy is initialized with a uniform distribution at each state). Next, in the probability estimation procedure, we use SMC by Bayesian Interval Estimation (BIE), as presented in [ZPC10], to estimate the probability that the MDP $M$, under the candidate control policy, generates a path such that the corresponding trace satisfies BLTL formula $\phi$. Finally, if the estimated probability converges, i.e., if the stopping criterion is met, we map the control policy to a vehicle control strategy. Otherwise, the control synthesis procedure is restarted using the latest update of the control policy. The flow of this approach is depicted in Fig. 6.
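The iteration in Fig. 6 can be summarized by the following loop. This is a sketch under assumptions: `improve`, `determinize`, and `estimate` are hypothetical stand-ins for the control synthesis step of [HMZ+12], the determinization of Sec. VIII-B, and the BIE estimate of Sec. VIII-C, and the iteration cap `max_iters` is an addition not present in the description above.

```python
from typing import Callable, Optional, Tuple, TypeVar

P = TypeVar("P")  # policy representation (left abstract here)


def synthesize_until_converged(
    mu: P,
    improve: Callable[[P], P],          # one control synthesis step (evaluation + improvement)
    determinize: Callable[[P], P],      # mu -> mu_det
    estimate: Callable[[P], float],     # probability estimate for mu_det, e.g. via BIE
    eps: float,
    max_iters: int = 100,
) -> Tuple[P, float]:
    """Alternate control synthesis and probability estimation until two consecutive
    estimates differ by at most eps; returns (mu_det*, p_hat*)."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    prev: Optional[float] = None
    for _ in range(max_iters):
        mu = improve(mu)
        mu_det = determinize(mu)
        p_hat = estimate(mu_det)
        if prev is not None and abs(p_hat - prev) <= eps:
            break
        prev = p_hat
    return mu_det, p_hat
```

The stopping test compares consecutive estimates only, so it detects stagnation of the estimate rather than optimality of the policy.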

B. Control synthesis 

The details of the control policy optimization algorithm can be found in [HMZ+12]; here we only give an informal overview of the approach. In the control policy evaluation procedure, we sample paths of the MDP $M$ under the current control policy $\mu$. Given a path $\omega = s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} s_2 \ldots s_{K-1} \xrightarrow{a_K} s_K$, where $a_k = (u_r^k, u_l^k)$, the corresponding trace $\sigma$ is generated as described in Secs. VI and VII. Next, we check formula $\phi$ on each $\sigma$ and estimate how likely it is for each action to lead to the satisfaction of BLTL formula $\phi$, i.e., we obtain an estimate of the probability that a path crossing a state-action pair $(s_k, a_{k+1})$, $k = 0, \ldots, K-1$, in $\omega$ will generate a trace that satisfies $\phi$. These estimates are then used in the control policy improvement procedure, in which we update the control policy $\mu$ by reinforcing the actions that most often led to the satisfaction of $\phi$. The authors of [HMZ+12] show that the updated control policy is provably better than the previous one, as it focuses on more promising regions of the state space.
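The Python sketch below illustrates the evaluation and improvement idea on pre-sampled paths, each given as the list of state-action pairs it crosses plus a flag indicating whether its trace satisfies $\phi$. The evaluation step estimates, for each pair, the fraction of crossing paths that satisfied $\phi$. The update rule in `improve`, which blends the old distribution (weight `h`) with a target that puts weight `g` on the best action and spreads the rest in proportion to the estimates, is an assumed form for illustration, not necessarily the exact rule of [HMZ+12].

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]                 # (state, action)
Policy = Dict[Hashable, Dict[Hashable, float]]   # mu[s][a]


def evaluate(paths: List[Tuple[List[Pair], bool]]) -> Dict[Pair, float]:
    """Q(s, a): fraction of sampled paths crossing (s, a) whose trace satisfies phi."""
    hits: Dict[Pair, int] = defaultdict(int)
    visits: Dict[Pair, int] = defaultdict(int)
    for pairs, satisfied in paths:
        for sa in set(pairs):                    # count each pair once per path
            visits[sa] += 1
            hits[sa] += int(satisfied)
    return {sa: hits[sa] / visits[sa] for sa in visits}


def improve(mu: Policy, q: Dict[Pair, float], g: float, h: float) -> Policy:
    """Assumed update: blend old mu (weight h) with a target favoring high-Q actions."""
    new_mu: Policy = {}
    for s, dist in mu.items():
        qs = {a: q.get((s, a), 0.0) for a in dist}
        total = sum(qs.values())
        if total == 0.0:                         # no satisfying samples at s: keep mu
            new_mu[s] = dict(dist)
            continue
        best = max(qs, key=qs.get)
        target = {a: g * (a == best) + (1 - g) * qs[a] / total for a in dist}
        new_mu[s] = {a: h * dist[a] + (1 - h) * target[a] for a in dist}
    return new_mu
```

Both the target and the blended distribution sum to one whenever the old distribution does, so the result is again a probabilistic policy.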

Fig. 6. Flow chart of the approach used to obtain the vehicle control strategy.

The algorithm takes as input the MDP $M$, BLTL formula $\phi$ and the current control policy $\mu$, together with the parameters of the algorithm (a greediness parameter $0 < g < 1$, a history parameter $0 < h < 1$, and the number $N$ of sample paths in the control policy evaluation procedure), and returns the updated probabilistic control policy $\mu$. In the next step, to estimate the probability of satisfaction, we use the deterministic version of $\mu$, denoted $\mu_{det}$, where for all $s \in S$ and $a \in A$,

$$\mu_{det}(s,a) = \mathbb{1}\{a = \arg\max_{a' \in Act(s)} \mu(s,a')\}.$$

In words, we compute a control policy that always picks the 
best estimated action at each state. 
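A direct Python rendering of $\mu_{det}$, assuming a policy stored as a nested dictionary (a hypothetical representation); ties, which the indicator leaves ambiguous, are broken here by taking the first maximizer.

```python
from typing import Dict, Hashable

Policy = Dict[Hashable, Dict[Hashable, float]]  # mu[s][a], hypothetical representation


def determinize(mu: Policy) -> Policy:
    """mu_det(s, a) = 1 if a is the (first) maximizer of mu(s, .), and 0 otherwise."""
    det: Policy = {}
    for s, dist in mu.items():
        best = max(dist, key=dist.get)
        det[s] = {a: 1.0 if a == best else 0.0 for a in dist}
    return det
```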

C. Probability estimation

Next, we estimate the probability that the MDP $M$, under the deterministic control policy $\mu_{det}$, generates a path such that the corresponding trace satisfies BLTL formula $\phi$. To do so, we use the BIE algorithm as presented in [ZPC10]. We denote the exact probability by $p_M$ and the estimate by $\hat{p}_M$.

The inputs of the algorithm are the MDP $M$, control policy $\mu_{det}$, BLTL formula $\phi$, half-interval size $\delta \in (0, \frac{1}{2})$, interval coefficient $c \in (\frac{1}{2}, 1)$, and the coefficients $\alpha, \beta$ of the Beta prior. The algorithm returns $\hat{p}_M$. It generates traces by sampling paths through $M$ under $\mu_{det}$ (as described in Secs. VI and VII) and checks whether the corresponding traces satisfy $\phi$, until enough statistical evidence has been found to support the claim that $p_M$ is inside the interval $[\hat{p}_M - \delta, \hat{p}_M + \delta]$ with arbitrarily high probability, i.e., $\Pr(p_M \in [\hat{p}_M - \delta, \hat{p}_M + \delta]) \geq c$.
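For intuition about the stopping rule, here is a compact Python sketch of Bayesian interval estimation in the spirit of [ZPC10], restricted to integer Beta prior coefficients (such as $\alpha = \beta = 1$ in Sec. IX) so that the Beta CDF can be computed exactly through its binomial-tail identity. The sampler `sample_satisfies`, which would sample a path of $M$ under $\mu_{det}$ and check its trace against $\phi$, is a hypothetical placeholder; in the usage lines it is replaced by a biased coin.

```python
import math
import random
from typing import Callable, Tuple


def beta_cdf(t: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at t for positive integers a, b, via the identity
    I_t(a, b) = P(Binomial(a + b - 1, t) >= a), summed in log space."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    n = a + b - 1
    log_t, log_1mt = math.log(t), math.log1p(-t)
    total = 0.0
    for j in range(a, n + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                          + j * log_t + (n - j) * log_1mt)
    return min(total, 1.0)


def bie(sample_satisfies: Callable[[], bool], delta: float, c: float,
        alpha: int = 1, beta: int = 1, max_samples: int = 100_000) -> Tuple[float, int]:
    """Sample until the posterior mass of [p_hat - delta, p_hat + delta] is >= c.
    Returns (p_hat, number of samples used)."""
    n = x = 0
    p_hat = alpha / (alpha + beta)
    while n < max_samples:
        n += 1
        x += sample_satisfies()            # True counts as 1
        a, b = x + alpha, n - x + beta     # posterior is Beta(a, b)
        p_hat = a / (a + b)                # posterior mean
        t0, t1 = p_hat - delta, p_hat + delta
        if t1 > 1.0:                       # shift the interval back into [0, 1]
            t0, t1 = 1.0 - 2 * delta, 1.0
        elif t0 < 0.0:
            t0, t1 = 0.0, 2 * delta
        if beta_cdf(t1, a, b) - beta_cdf(t0, a, b) >= c:
            break
    return p_hat, n


rng = random.Random(0)
p_hat, n = bie(lambda: rng.random() < 0.7, delta=0.05, c=0.95)
```

The cost of `beta_cdf` grows with the number of samples, which is acceptable for a sketch; a production implementation would use a library Beta CDF instead.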

We stop iterating over the control synthesis and the probability estimation procedures when the difference between two consecutive probability estimates falls within a neighborhood of radius $\epsilon \in (0,1)$, i.e., when the difference is smaller than or equal to $\epsilon$. Let $\mu^*_{det}$ and $\hat{p}^*_M$ be the current control policy and the corresponding probability estimate, respectively, when the stopping criterion is met.

D. Control strategy 

The vehicle control strategy is a function $\gamma: S \to U_r \times U_l$ that maps a sequence of pairs of measured intervals, i.e., a state of the MDP, to the control inputs:

$$\gamma((W^1, \ldots, W^k)) = \gamma(s_k) = \arg\max_{a \in Act(s_k)} \mu^*_{det}(s_k, a), \quad k = 1, \ldots, K-1, \qquad (9)$$

with $\gamma(s_0) = \arg\max_{a \in Act(s_0)} \mu^*_{det}(s_0, a)$. At stage $k$, the control inputs are

$$(u_r^k, u_l^k) = \gamma((W^1, \ldots, W^{k-1})) \in U_r \times U_l.$$

Thus, given a sequence of pairs of measured intervals, $\gamma$ returns the control inputs for the next stage; the control inputs are equal to the action returned by $\mu^*_{det}$ at the state of the MDP corresponding to that sequence.
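A minimal Python sketch of $\gamma$ as a lookup table, assuming (hypothetically) that each MDP state is keyed by the tuple of measured interval pairs observed so far, with the empty tuple standing for $s_0$, and that $\mu^*_{det}$ is stored as a dictionary over actions $(u_r, u_l)$. The numerical inputs in the example policy are made up.

```python
from typing import Dict, Hashable, Sequence, Tuple

Inputs = Tuple[float, float]                        # (u_r, u_l)
StateKey = Tuple[Hashable, ...]                     # (W^1, ..., W^k); () is s_0


class VehicleControlStrategy:
    """gamma: measured interval pairs so far -> control inputs for the next stage."""

    def __init__(self, mu_det: Dict[StateKey, Dict[Inputs, float]]):
        self.mu_det = mu_det

    def __call__(self, measurements: Sequence[Hashable]) -> Inputs:
        dist = self.mu_det[tuple(measurements)]     # the MDP state s_k for this sequence
        return max(dist, key=dist.get)              # argmax_a mu*_det(s_k, a)


# Hypothetical policy over two states, with W^1 encoded as a pair of interval indices.
gamma = VehicleControlStrategy({
    (): {(2.0, 2.0): 1.0, (1.5, 2.5): 0.0},
    ((0, 1),): {(2.0, 2.0): 0.0, (1.5, 2.5): 1.0},
})
```

At stage $k$, the controller would call `gamma` on $(W^1, \ldots, W^{k-1})$ and apply the returned inputs for the next sampling period.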

Theorem 1: The probability that the system given by Eqn. (1), under the vehicle control strategy $\gamma$, generates a state trajectory that satisfies BLTL formula $\phi$ (Eqn. (2)) is bounded from below by $p^*_M$, the probability of satisfying $\phi$ on $M$ under $\mu^*_{det}$, where $\Pr(p^*_M \in [\hat{p}^*_M - \delta, \hat{p}^*_M + \delta]) \geq c$.

Proof: Let $\omega$ be a path through the MDP $M$ and $D((x(t),y(t)),d(t))$ the corresponding uncertainty region as defined in Sec. VI. Let $q'(t)$ be any state trajectory as defined in Prop. 2. Also, let $\sigma_D$ and $\sigma_{q'}$ be the corresponding traces. Trace $\sigma_D$ can either (i) satisfy $\phi$ or (ii) not satisfy $\phi$.

Let us first consider the former. If $\sigma_D \models \phi$, then from Prop. 3 it follows that $\sigma_{q'} \models \phi$. Under $\gamma$, the probability of generating $q'(t)$ is equal to the probability of generating path $\omega$ under $\mu^*_{det}$. Since under $\mu^*_{det}$ the probability that a path through the MDP $M$ generates a satisfying trace is $p^*_M$, it follows that the probability that the system given by Eqn. (1), under $\gamma$, generates a satisfying state trajectory through case (i) is also $p^*_M$.

To show that $p^*_M$ is a lower bound, we need to consider the latter case. It is sufficient to observe that, because of the conservative approximation of $D((x(t),y(t)),d(t))$, it is possible that $\sigma_{q'}$ satisfies $\phi$ even though $\sigma_D$ does not. Therefore, the probability that the system given by Eqn. (1), under the vehicle control strategy $\gamma$, generates a state trajectory that satisfies BLTL formula $\phi$ is bounded from below by $p^*_M$. The rest of the proof, i.e., $\Pr(p^*_M \in [\hat{p}^*_M - \delta, \hat{p}^*_M + \delta]) \geq c$, is given in [ZPC10]. ■

E. Complexity 

As stated above, the size of the MDP $M$ is bounded above by $(|U_r| \times |U_l| \times n_r \times n_l)^K$, so storing the whole MDP can be expensive in terms of memory. Since our approach is sample-based, it is not necessary to construct the MDP explicitly. Instead, a state of the MDP is stored only if it is sampled during the control synthesis procedure. As a result, during the execution, the number of states stored in memory is bounded above by $N \times K \times n$, where $n$ is the number of iterations between the control synthesis and the probability estimation procedures.

The complexity analysis of the control synthesis part can 
be found in [HMZ + 12] and the complexity analysis of BIE 
algorithm can be found in [ZPC10]. 
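The lazy construction can be sketched in Python as follows: states are created only when a sampled path reaches them, so each call adds at most $N \times K$ entries, and repeating it over $n$ iterations gives the $N \times K \times n$ bound. The state encoding (triples of action and noise-interval indices), the policy `choose_action`, and the interval probabilities are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical state encoding: the sequence of (action index, right interval index,
# left interval index) triples observed so far; the empty tuple is the initial state s_0.
State = Tuple[Tuple[int, int, int], ...]


def sample_lazily(choose_action: Callable[[State], int], probs_r: List[float],
                  probs_l: List[float], K: int, N: int, rng: random.Random,
                  stored: Dict[State, int]) -> None:
    """Sample N paths of K stages and record only the states actually visited.

    `stored` maps each visited state to its visit count and can be reused across
    iterations, so at most N * K new entries are added per call.
    """
    for _ in range(N):
        s: State = ()
        for _ in range(K):
            a = choose_action(s)
            i_r = rng.choices(range(len(probs_r)), weights=probs_r)[0]
            i_l = rng.choices(range(len(probs_l)), weights=probs_l)[0]
            s = s + ((a, i_r, i_l),)
            stored[s] = stored.get(s, 0) + 1
```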



IX. Case study 

We considered the system given by Eqn. (1) and used the numerical values corresponding to Dr. Robot's x80Pro mobile robot equipped with two incremental encoders. The parameters were $r = 0.085$ m and $L = 0.295$ m. To reduce the complexity, $U_r \times U_l$ was limited to three pairs of control inputs, corresponding to the vehicle turning left at $\frac{1}{2}$ rad/s, going straight, and turning right at $\frac{1}{2}$ rad/s, respectively, at a fixed forward speed.
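To relate the three input pairs to wheel velocities, the Python sketch below inverts the standard differential-drive kinematics $v = r(u_r + u_l)/2$, $\omega = r(u_r - u_l)/L$ with the parameters above. This assumes that the noise-free part of Eqn. (1) has this standard form, and the forward speed `V` is a hypothetical value.

```python
def wheel_speeds(v: float, omega: float, r: float = 0.085, L: float = 0.295):
    """Right/left wheel angular velocities (rad/s) for forward speed v (m/s) and turning
    rate omega (rad/s, positive = left), inverting v = r(u_r + u_l)/2, omega = r(u_r - u_l)/L."""
    u_r = (v + omega * L / 2) / r
    u_l = (v - omega * L / 2) / r
    return u_r, u_l


V = 0.2  # hypothetical forward speed in m/s
for name, omega in (("left", 0.5), ("straight", 0.0), ("right", -0.5)):
    u_r, u_l = wheel_speeds(V, omega)
    print(f"{name:8s} u_r = {u_r:.3f} rad/s, u_l = {u_l:.3f} rad/s")
```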

Measurement resolution: To obtain the angular wheel velocity, the frequency counting method [PTPZ07] was used, i.e., the encoder pulses inside a given sampling period were counted. The number of pulses per revolution (i.e., the number of windows in the code track of the encoders) was 378 and the sampling period was set to $\Delta t = 2.6$ s. Thus, according to [PTPZ07], the measurement resolution was $\Delta\varepsilon_r = \Delta\varepsilon_l = \frac{2\pi}{378 \cdot 2.6} \approx 0.0064$ rad/s.
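The resolution above follows from counting pulses: with frequency counting, one pulse per sampling period corresponds to an angular velocity of $2\pi/(N_p \Delta t)$, where $N_p$ is the number of pulses per revolution. A short Python check with the values above (the function names are illustrative):

```python
import math


def encoder_resolution(pulses_per_rev: int, dt: float) -> float:
    """Angular-velocity step (rad/s) of frequency counting: one pulse in dt seconds."""
    return 2 * math.pi / (pulses_per_rev * dt)


def measured_velocity(pulse_count: int, pulses_per_rev: int, dt: float) -> float:
    """Angular wheel velocity estimated from the pulses counted in one sampling period."""
    return pulse_count * encoder_resolution(pulses_per_rev, dt)


print(round(encoder_resolution(378, 2.6), 4))  # 0.0064
```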

Probability density functions: We obtained the distributions through experimental trials. Specifically, we used control inputs from $U_r \times U_l$ as the robot inputs and then measured the actual angular wheel velocities using the encoders. We obtained $\varepsilon_i^{min}$ ($\varepsilon_i^{max}$) by taking the minimum (maximum) over $\{\underline{\varepsilon}_i^1, \ldots, \underline{\varepsilon}_i^k\}$ ($\{\overline{\varepsilon}_i^1, \ldots, \overline{\varepsilon}_i^k\}$), where $[\underline{\varepsilon}_i^j, \overline{\varepsilon}_i^j]$, $j \in \{1, \ldots, k\}$, $i \in \{r, l\}$, was the noise interval, of length $\Delta\varepsilon_i$, determined from the $j$-th measurement of encoder $i$, and $k$ was the total number of measurements. Finally, the probabilities in Eq. (4) that define the probability density functions were equal to the number of times a particular noise interval was measured, divided by $k$. For $k = 150$ (i.e., using each control input from $U_r \times U_l$ 50 times), we obtained $-\varepsilon_r^{min} = \varepsilon_r^{max} = -\varepsilon_l^{min} = \varepsilon_l^{max} = 0.0096$ and the corresponding probabilities.
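The procedure above can be sketched in Python for one encoder as follows. It assumes each measurement has already been converted into a noise interval of length $\Delta\varepsilon_i$ identified by its lower end; the sample data are hypothetical and only illustrate three intervals of width 0.0064 spanning $[-0.0096, 0.0096]$, not the measured probabilities.

```python
from collections import Counter
from typing import Dict, List, Tuple


def noise_statistics(lower_ends: List[float], width: float) -> Tuple[float, float, Dict[float, float]]:
    """Given the lower ends of the measured noise intervals [e_j, e_j + width] of one
    encoder, return (eps_min, eps_max, empirical probability of each interval)."""
    k = len(lower_ends)
    eps_min = min(lower_ends)
    eps_max = max(lower_ends) + width
    counts = Counter(round(e, 10) for e in lower_ends)   # round to merge float noise
    return eps_min, eps_max, {e: n / k for e, n in counts.items()}


# Hypothetical data: 150 intervals of width 0.0064 with made-up frequencies.
data = [-0.0096] * 30 + [-0.0032] * 90 + [0.0032] * 30
eps_min, eps_max, probs = noise_statistics(data, 0.0064)
```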

The set of propositions was $\Pi = \{\pi_u, \pi_p, \pi_{t1}, \pi_{t2}, \pi_d\}$, where $\pi_u$, $\pi_p$, $\pi_{t1}$, $\pi_{t2}$, and $\pi_d$ label the unsafe, pick-up, test1, test2, and drop-off regions, respectively.
The motion specification was: 

Start from an initial state $q_{init}$ and reach a pick-up region within 14 time units and stay in it for at least 0.8 time units to pick up the load. After entering the pick-up region, reach a test1 region within 5 time units and stay in it for at least 1 time unit, or reach a test2 region within 5 time units and stay in it for at least 0.8 time units. Finally, after entering the test1 region or the test2 region, reach a drop-off region within 4 time units to drop off the load. Always avoid the unsafe regions.

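The specification translates to the BLTL formula

$$\phi = \neg\pi_u \, \mathcal{U}^{\leq 14} \big(G^{\leq 0.8} \pi_p \wedge \neg\pi_u \, \mathcal{U}^{\leq 5} \big([G^{\leq 1} \pi_{t1} \vee G^{\leq 0.8} \pi_{t2}] \wedge \neg\pi_u \, \mathcal{U}^{\leq 4} \pi_d\big)\big). \qquad (10)$$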
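To make the meaning of Eq. (10) on traces concrete, the Python sketch below checks a formula of this shape on a trace $(o_1, t_1)\ldots(o_k, t_k)$. It assumes that a trace position satisfies $\pi$ iff $o_i = \pi$, that $\phi_1 \mathcal{U}^{\leq T} \phi_2$ requires reaching a $\phi_2$ position within $T$ accumulated time while $\phi_1$ holds at every earlier position, and that $G^{\leq T}\pi$ means staying in $\pi$ for at least $T$ time units, following the informal specification; the precise BLTL semantics used in the paper may differ in details, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Trace = List[Tuple[str, float]]  # (o_i, t_i); "" encodes the empty proposition


@dataclass
class Prop:
    name: str

@dataclass
class Not:
    arg: "Formula"

@dataclass
class And:
    left: "Formula"
    right: "Formula"

@dataclass
class Or:
    left: "Formula"
    right: "Formula"

@dataclass
class Stay:      # G^{<=T} pi, read as "o_i = pi and t_i >= T" (assumed semantics)
    bound: float
    name: str

@dataclass
class Until:     # left U^{<=T} right
    left: "Formula"
    bound: float
    right: "Formula"

Formula = Union[Prop, Not, And, Or, Stay, Until]


def sat(tr: Trace, i: int, f: Formula) -> bool:
    """True iff the suffix of trace tr starting at position i satisfies f."""
    if isinstance(f, Prop):
        return tr[i][0] == f.name
    if isinstance(f, Not):
        return not sat(tr, i, f.arg)
    if isinstance(f, And):
        return sat(tr, i, f.left) and sat(tr, i, f.right)
    if isinstance(f, Or):
        return sat(tr, i, f.left) or sat(tr, i, f.right)
    if isinstance(f, Stay):
        return tr[i][0] == f.name and tr[i][1] >= f.bound
    elapsed = 0.0                      # Until: time from position i to position j
    for j in range(i, len(tr)):
        if elapsed > f.bound:
            return False
        if sat(tr, j, f.right):
            return True
        if not sat(tr, j, f.left):
            return False
        elapsed += tr[j][1]
    return False


safe = Not(Prop("u"))
phi = Until(safe, 14, And(Stay(0.8, "p"),
            Until(safe, 5, And(Or(Stay(1, "t1"), Stay(0.8, "t2")),
                               Until(safe, 4, Prop("d"))))))
```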



Two different environments are shown in Fig. 7. The estimated probabilities $\hat{p}^*_M$ corresponding to environments A and B were 0.664 and 0.719, respectively. From Eq. (10), it followed that $K = 9$. The numerical values in the control synthesis procedure and the probability estimation procedure were as follows: $N = 10000$, $h = 0.6$, $g = 0.6$, $\delta = 0.05$, $c = 0.95$, $\alpha = \beta = 1$, and $\epsilon = 0.05$. For both environments, we found the vehicle control strategy through the method described in Sec. VIII.
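One way to arrive at $K = 9$, assuming $K = \lceil \#\phi / \Delta t \rceil$ with $\#\phi$ the usual syntactic time bound of a BLTL formula ($\#(\phi_1 \mathcal{U}^{\leq T} \phi_2) = T + \max(\#\phi_1, \#\phi_2)$, $\#(G^{\leq T}\phi) = T + \#\phi$): for Eq. (10) the bound is $14 + 5 + 4 = 23$ and $\lceil 23/2.6 \rceil = 9$. The tuple encoding in the Python sketch below is hypothetical.

```python
import math


# Hypothetical tuple encoding: ("prop", name), ("not", f), ("and", f, g), ("or", f, g),
# ("G", T, f) for G^{<=T} f, and ("U", f, T, g) for f U^{<=T} g.
def horizon(f) -> float:
    """Syntactic time bound of a BLTL formula."""
    op = f[0]
    if op == "prop":
        return 0.0
    if op == "not":
        return horizon(f[1])
    if op in ("and", "or"):
        return max(horizon(f[1]), horizon(f[2]))
    if op == "G":
        return f[1] + horizon(f[2])
    if op == "U":
        return f[2] + max(horizon(f[1]), horizon(f[3]))
    raise ValueError(f"unknown operator {op!r}")


safe = ("not", ("prop", "u"))
phi = ("U", safe, 14,
       ("and", ("G", 0.8, ("prop", "p")),
        ("U", safe, 5,
         ("and", ("or", ("G", 1, ("prop", "t1")), ("G", 0.8, ("prop", "t2"))),
          ("U", safe, 4, ("prop", "d"))))))
K = math.ceil(horizon(phi) / 2.6)   # sampling period Delta t = 2.6 s
print(horizon(phi), K)              # 23.0 9
```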




Fig. 7. 20 sample state (position) trajectories for cases A and B (to be read top-to-bottom). The unsafe, pick-up, test1, test2, and drop-off regions are shown in red, blue, cyan, yellow, and green, respectively. Satisfying and violating trajectories are shown in black and red, respectively. Note that, in case A, the upper two red trajectories avoid the unsafe regions and visit the pick-up, test2, and drop-off regions in the correct order, but they violate the specification because they do not stay long enough in the test2 region.



Since it is not possible to obtain the exact probability that the system given by Eqn. (1), under the vehicle control strategy, generates a satisfying state trajectory, we verified our result (Theorem 1) by performing multiple runs of the BIE algorithm while simulating the system under the vehicle control strategy (using the same numerical values as stated above and generating traces as described in Sec. IV). We denote the resulting probability estimate by $\hat{p}_S$ and compare it to $\hat{p}^*_M$.

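TABLE I
Probability estimates of satisfying the specification

| Environment | $\hat{p}^*_M$ | $\hat{p}_S$ (Run 1) | $\hat{p}_S$ (Run 2) | $\hat{p}_S$ (Run 3) |
|---|---|---|---|---|
| A | 0.664 | 0.847 | 0.832 | 0.826 |
| B | 0.719 | 0.891 | 0.898 | 0.879 |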



In Fig. 7 we show sample state trajectories, and in Table I we compare the probability estimates obtained on the MDP, $\hat{p}^*_M$, with the probability estimates obtained by simulating the system, $\hat{p}_S$. The results support Theorem 1, since $\hat{p}_S$ is bounded from below by $\hat{p}^*_M$ in every run. The discrepancy between the probabilities is mostly due to the conservative approximation of the uncertainty region in Sec. VI. The Matlab code used to obtain the vehicle control strategy ran for approximately 2.2 hours on a computer with a 2.5 GHz dual processor.

In Fig. 8 we show a sample run of the robot in environment A. A projector was used to display the environment, and the state (position) trajectory was reconstructed using the OptiTrack (http://www.naturalpoint.com/optitrack) system with eight cameras.




Fig. 8. Snapshots (to be read top-to-bottom) from a movie (available online at http://people.bu.edu/icizelj/Igor_Cizelj/diff-bltl.html) showing a robot motion produced by applying the vehicle control strategy for environment A. The generated trajectory satisfied $\phi$ (Eq. (10)).



X. Conclusion 

We developed a feedback control strategy for a stochastic differential drive mobile robot such that the probability of satisfying a time-constrained specification given in terms of a temporal logic statement is maximized. By mapping sensor measurements to a Markov Decision Process (MDP), we translate the problem to finding a control policy that maximizes the probability of satisfying a Bounded Linear Temporal Logic (BLTL) formula on the MDP. The solution is based on Statistical Model Checking (SMC) for MDPs, and we show that the probability that the vehicle satisfies the specification is bounded from below by the probability of satisfying the specification on the MDP.

The key limitation of the proposed approach is the computation time. Since sampling accounts for the majority of our runtime, future work includes improving the sampling performance and making the implementation fully parallel, as well as developing a less conservative uncertainty model.